RTF-Based Binaural MVDR Beamformer Exploiting an External Microphone in a Diffuse Noise Field
Besides suppressing all undesired sound sources, an important objective of a
binaural noise reduction algorithm for hearing devices is the preservation of
the binaural cues, aiming at preserving the spatial perception of the acoustic
scene. A well-known binaural noise reduction algorithm is the binaural minimum
variance distortionless response beamformer, which can be steered using the
relative transfer function (RTF) vector of the desired source, relating the
acoustic transfer functions between the desired source and all microphones to a
reference microphone. In this paper, we propose a computationally efficient
method to estimate the RTF vector in a diffuse noise field, requiring an
additional microphone that is spatially separated from the head-mounted
microphones. Assuming that the spatial coherence between the noise components
in the head-mounted microphone signals and the additional microphone signal is
zero, we show that an unbiased estimate of the RTF vector can be obtained.
Based on real-world recordings, experimental results for several reverberation
times show that the proposed RTF estimator outperforms the widely used RTF
estimator based on covariance whitening and a simple biased RTF estimator in
terms of noise reduction and binaural cue preservation performance. Comment: Accepted at ITG Conference on Speech Communication 201
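The zero-coherence assumption above suggests a simple estimator: if the noise in the head-mounted microphones is uncorrelated with the external microphone signal, cross-correlating every head-mounted channel with the external channel leaves only the desired-source term, and dividing by the reference entry yields the RTF vector. The following is a minimal sketch of that idea for a single frequency bin, not the authors' implementation; the function name `estimate_rtf`, its arguments, and the toy signals are assumptions made for illustration.

```python
def estimate_rtf(head_mics, external, ref=0):
    """Sketch of an RTF estimate for one frequency bin (hypothetical helper).

    head_mics: list of M lists of complex STFT coefficients, one per
               head-mounted microphone
    external:  list of complex STFT coefficients of the external microphone
    ref:       index of the reference microphone
    """
    n_frames = len(external)
    # Frame-averaged cross-correlation of each head-mounted channel with the
    # external channel. Noise that is uncorrelated with the external signal
    # averages out of this quantity.
    cross = [
        sum(y * e.conjugate() for y, e in zip(channel, external)) / n_frames
        for channel in head_mics
    ]
    # Normalise by the reference entry so that rtf[ref] == 1.
    return [c / cross[ref] for c in cross]


# Toy data (assumed, not from the paper): source s, true RTF vector a, and
# noise sequences chosen to be exactly orthogonal to the external signal.
s = [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]
a = [1 + 0j, 0.5 - 0.5j]
noise = [[0.3, -0.3, 0.3, -0.3], [0.2, 0.2, -0.2, -0.2]]
head = [[a[m] * s[t] + noise[m][t] for t in range(4)] for m in range(2)]
ext = [0.5 * x for x in s]

rtf = estimate_rtf(head, ext)
```

With these toy signals the estimate recovers `a` because the noise terms cancel exactly; with real recordings the cancellation only holds on average over many frames.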
Instrumental and perceptual evaluation of dereverberation techniques based on robust acoustic multichannel equalization
Speech signals recorded in an enclosed space by microphones at a distance from the speaker are often corrupted by reverberation, which arises from the superposition of many delayed and attenuated copies of the source signal. Because reverberation degrades speech quality, dereverberation can enhance it. Dereverberation techniques based on acoustic multichannel equalization are known to be sensitive to room impulse response perturbations. In order to increase robustness, several methods have been proposed, for example using a shorter reshaping filter length, incorporating regularization, or applying a sparsity-promoting penalty function. This paper focuses on evaluating the performance of these methods for single-source multi-microphone scenarios, using instrumental performance measures as well as subjective listening tests. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are more advantageous than channel-based performance measures for evaluating the perceptual speech quality of signals that were dereverberated by equalization techniques. Furthermore, this analysis also demonstrates the need to develop more reliable instrumental performance measures.
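The correlation analysis mentioned above can be illustrated with the Pearson correlation coefficient between an instrumental measure and listening-test scores. This is a minimal sketch, assuming Pearson correlation is the statistic of interest (the abstract does not say which one is used); the function name `pearson` and the placeholder numbers are illustrative and are not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance and standard deviations, all without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Placeholder values only: one instrumental score and one mean listening
# score per processed signal.
instrumental = [1.0, 2.0, 3.0, 4.0]
perceptual = [2.0, 4.0, 6.0, 8.0]
r = pearson(instrumental, perceptual)
```

A value of `r` close to 1 or -1 indicates that the instrumental measure tracks the perceptual scores closely, while a value near 0 indicates that it does not.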
Square root-based multi-source early PSD estimation and recursive RETF update in reverberant environments by means of the orthogonal Procrustes problem
Multi-channel short-time Fourier transform (STFT) domain-based processing of
reverberant microphone signals commonly relies on power-spectral-density (PSD)
estimates of early source images, where early refers to reflections contained
within the same STFT frame. State-of-the-art approaches to multi-source early
PSD estimation, given an estimate of the associated relative early transfer
functions (RETFs), conventionally minimize the approximation error defined with
respect to the early correlation matrix, requiring non-negative inequality
constraints on the PSDs. Instead, we here propose to factorize the early
correlation matrix and minimize the approximation error defined with respect to
the early-correlation-matrix square root. The proposed minimization problem --
constituting a generalization of the so-called orthogonal Procrustes problem --
seeks a unitary matrix and the square roots of the early PSDs up to an
arbitrary complex argument, making non-negative inequality constraints
redundant. A solution is obtained iteratively, requiring one singular value
decomposition (SVD) per iteration. The estimated unitary matrix and early PSD
square roots further allow the RETF estimate to be updated recursively, which is
not inherently possible in the conventional approach. An estimate of the said
early-correlation-matrix square root itself is obtained by means of the
generalized eigenvalue decomposition (GEVD), where we further propose to
restore non-stationarities by desmoothing the generalized eigenvalues in order
to compensate for inevitable recursive averaging. Simulation results indicate
fast convergence of the proposed multi-source early PSD estimation approach in
only one iteration if initialized appropriately, and better performance as
compared to the conventional approach.
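For orientation, the classical orthogonal Procrustes problem that the abstract generalizes has a closed-form solution based on a single SVD. The statement below is the textbook result, not the generalized problem of the paper; the symbols $A$, $B$, and $Q$ are generic placeholders rather than the paper's notation.

```latex
\hat{Q} \;=\; \arg\min_{Q^{H} Q = I} \; \lVert A - B\,Q \rVert_{F}^{2},
\qquad
B^{H} A = U \Sigma V^{H}
\;\;\Longrightarrow\;\;
\hat{Q} = U V^{H}.
```

Since $\lVert B Q \rVert_F = \lVert B \rVert_F$ for unitary $Q$, minimizing the error is equivalent to maximizing $\operatorname{Re}\,\operatorname{tr}(Q^{H} B^{H} A)$, which is attained at $Q = U V^{H}$. This is consistent with the abstract's remark that each iteration requires one SVD.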
Measuring, modelling and predicting perceived reverberation
This paper investigates the relationship between the perceived level of reverberation and parameters measured from the room impulse response (RIR), as well as the design of an instrumental measure that predicts this perceived level. We first present the results of an experimental listening test conducted to assess the level of perceived reverberation in speech captured by a single microphone, before analysing the gathered data to assess the influence of parameters such as the reverberation time (T60) or the direct-to-reverberant ratio (DRR). Secondly, we use the results of this analysis to improve the signal-based reverberation decay tail (RDT) measure, previously proposed by the authors to predict the perceived level of reverberation. The accuracy of the proposed measure is evaluated in terms of correlation with the subjective scores and compared to the performance of predictors using parameters extracted from the RIR. Results show that the proposed modifications to the RDT do improve its accuracy. Though still slightly outperformed by measures based on parameters of the RIR, we believe the proposed measure to be useful in scenarios in which the RIR or its parameters are unknown.
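As an illustration of one of the RIR parameters named above, the DRR can be computed as the energy ratio between a short window around the direct-path peak and everything that follows it. This is a minimal sketch under common conventions, not the procedure used in the paper; the function name `drr_db`, the `half_width` parameter, and the toy impulse response are assumptions.

```python
import math


def drr_db(rir, half_width):
    """Direct-to-reverberant ratio in dB for a sampled room impulse response.

    rir:        list of impulse response samples
    half_width: number of samples on each side of the direct-path peak that
                are counted as direct sound (an assumed convention)
    """
    # Locate the direct path as the sample with the largest magnitude.
    peak = max(range(len(rir)), key=lambda i: abs(rir[i]))
    start = max(0, peak - half_width)
    end = min(len(rir), peak + half_width + 1)
    direct_energy = sum(h * h for h in rir[start:end])
    # Everything after the direct window counts as reverberant energy.
    # Note: this raises an error if the reverberant part is all zeros.
    reverberant_energy = sum(h * h for h in rir[end:])
    return 10.0 * math.log10(direct_energy / reverberant_energy)


# Toy impulse response (assumed): a unit direct path followed by two
# reflections of amplitude 0.5.
toy_rir = [0.0, 1.0, 0.0, 0.5, 0.5]
drr = drr_db(toy_rir, half_width=1)
```

Here the direct energy is 1.0 and the reverberant energy is 0.5, so the ratio is 2, or about 3 dB.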
Non-intrusive speech quality prediction using modulation energies and LSTM-network
Many signal processing algorithms have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. Perceptual measures, i.e., listening tests, are usually considered the most reliable way to evaluate the quality of speech processed by such algorithms but are costly and time-consuming. Consequently, speech enhancement algorithms are often evaluated using signal-based measures, which can be either intrusive or non-intrusive. As the computation of intrusive measures requires a reference signal, only non-intrusive measures can be used in applications for which the clean speech signal is not available. However, many existing non-intrusive measures correlate poorly with the perceived speech quality, particularly when applied over a wide range of algorithms or acoustic conditions. In this paper, we propose a novel non-intrusive measure of the quality of processed speech that combines modulation energy features and a recurrent neural network using long short-term memory cells. We collected a dataset of perceptually evaluated signals representing several acoustic conditions and algorithms and used this dataset to train and evaluate the proposed measure. Results show that the proposed measure yields higher correlation with perceptual speech quality than that of benchmark intrusive and non-intrusive measures when considering various categories of algorithms. Although the proposed measure is sensitive to mismatch between training and testing, results show that it is a useful approach to evaluate specific algorithms over a wide range of acoustic conditions and may, thus, become particularly useful for real-time selection of speech enhancement algorithm settings.
Optimal Binaural LCMV Beamforming in Complex Acoustic Scenarios: Theoretical and Practical Insights
Binaural beamforming algorithms for head-mounted assistive listening devices
are crucial to improve speech quality and speech intelligibility in noisy
environments, while maintaining the spatial impression of the acoustic scene.
While the well-known BMVDR beamformer is able to preserve the binaural cues of
one desired source, the BLCMV beamformer uses additional constraints to also
preserve the binaural cues of interfering sources. In this paper, we provide
theoretical and practical insights on how to optimally set the interference
scaling parameters in the BLCMV beamformer for an arbitrary number of
interfering sources. In addition, since in practice only a limited temporal
observation interval is available to estimate all required beamformer
quantities, we provide an experimental evaluation in a complex acoustic
scenario using measured impulse responses from hearing aids in a cafeteria for
different observation intervals. The results show that even rather short
observation intervals are sufficient to achieve a decent noise reduction
performance and that a proposed threshold on the optimal interference scaling
parameters leads to smaller binaural cue errors in practice. Comment: To appear in Proc. IWAENC 201
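To make the role of the interference scaling parameter concrete, the general LCMV beamformer has a well-known closed form. The expression below is the textbook result; the specialization to one desired source and one interferer, with RTF vectors $\mathbf{a}$ and $\mathbf{b}$, noise covariance matrix $\mathbf{R}$, and a real-valued scaling $\delta$, is an assumed notation for illustration and is not taken from the paper.

```latex
\mathbf{w} \;=\; \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}\, \mathbf{w}
\quad \text{s.t.} \quad \mathbf{C}^{H} \mathbf{w} = \mathbf{g}
\qquad \Longrightarrow \qquad
\mathbf{w} \;=\; \mathbf{R}^{-1} \mathbf{C} \left( \mathbf{C}^{H} \mathbf{R}^{-1} \mathbf{C} \right)^{-1} \mathbf{g},
\qquad
\mathbf{C} = \begin{bmatrix} \mathbf{a} & \mathbf{b} \end{bmatrix},
\quad
\mathbf{g} = \begin{bmatrix} 1 \\ \delta \end{bmatrix}.
```

In this reading, the first constraint passes the desired source undistorted at the reference microphone, while the second scales the interferer by $\delta$ instead of cancelling it completely, which is what allows its binaural cues to be retained. Dropping the second constraint reduces the expression to the MVDR solution $\mathbf{R}^{-1}\mathbf{a} / (\mathbf{a}^{H} \mathbf{R}^{-1} \mathbf{a})$.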